Fanhu Zeng

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning

May 25, 2026

Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

May 18, 2026

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

Apr 21, 2026

CL-VISTA: Benchmarking Continual Learning in Video Large Language Models

Apr 01, 2026

Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients

Mar 18, 2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Feb 26, 2026

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?

Dec 23, 2025

MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark

Aug 10, 2025

A Comprehensive Survey on Continual Learning in Generative Models

Jun 16, 2025

Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration

Jun 06, 2025